ABSTRACT

The objective of the final phase of Contract AF30(602)-2616 was to
determine the effect of various types of indexing aids on the minimum
reliability of indexers, as determined and described earlier in Technical
Note RADC-TDR-62-426. The three types of tools tested as indexing aids
on a collection of randomly selected chemical patents were a classificatory
device (Manual of Classification of the U. S. Patent Office), an
alphabetical subject-authority list of terms (Chemical Patents Code List
of Documentation Incorporated), and a concept-associative tool (Chemical
Engineering Thesaurus of the American Institute of Chemical Engineers).
The first two tools registered a highly significant improvement over the
"base zero" inter-indexer consistency; the concept-association aid, on
the other hand, failed to show any effect. The analyses and interpretation
of the results indicate that an improvement in indexer reliability,
and hence in the quality of indexing, can be brought about through the
use of prescriptive, rather than suggestive, vocabularies, which formalize
the relationships among terms so that they invariably prescribe the
indexer's assignment of index terms. Indexing aids which display numerous
variable, ill-defined relationships among terms appear to act in the
opposite direction.

CHAPTER I


INTRODUCTION 

The overall aim of the studies carried out under Contract
AF30(602)-2616 is an improvement in the quality of library indexing.

In its broad sense, the phrase "quality of indexing" probably
entails numerous factors, ranging from the correct understanding
of the subject matter in documents to the constraints inherent
in individual indexing systems; for the purposes of this study, however,
quality of indexing is equated with "reliability of indexing."

Reliability  of  indexing  refers  to  the  consistency  with  which  indexers 
tend  to  choose  the  same  terms  as  descriptive  of  the  same  documents. 

Inter-indexer reliability refers to the consistency among indexers;
intra-indexer reliability refers to the measurement of the degree to
which any one indexer tends to choose the same index terms repeatedly
for the same document, the possibility that memory of past performance
influences a given judgment being accounted for and excluded.¹

¹Statement of Work, Contract AF30(602)-2616.

The assumption that the problem of indexing quality rests, at
least to an important degree, with the determination and improvement of
the level of reliability of indexing may be validated by the following
illustration, in which information retrieval is presented in terms of
matrix algebra.² If the union of two retrieval terms, A and B, is
described as the trace of the union of A and B operating on the storage
matrix, and the intersection of A and B is described as the determinant
of the union of A and B, it is possible, in combination with probability
theory, to calculate the retrieval probability of indexed documents.

For example, assume that one document is indexed by three
indexers, each of whom uses seven terms, in such a way that the total
vocabulary consists of ten indexing terms. Let there be a 50 per cent
consistency between each pair of indexers and a 40 per cent consistency
among all three indexers. The probability of retrieving that document
by searching under any given number of the ten terms is apparent from
Figure 1. It may be seen that the probability of document retrieval, in
the situation described, increases with higher inter-indexer consistency.
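The arithmetic of this illustration can be checked with a short sketch. Read against the ten-term vocabulary, the stated figures imply that each pair of indexers shares five terms and all three share four (7 + 7 + 7 - 15 + 4 = 10); the term sets below are hypothetical and were chosen only to satisfy those counts. The retrieval model is likewise an assumption, not necessarily the one behind Figure 1: a search uses k distinct terms drawn at random from the vocabulary and combined with AND, so a document is retrieved only if its index record contains every search term.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical term sets (terms numbered 0-9) chosen only to match the text:
# 7 terms per indexer, 10 terms in all, 5 terms shared by each pair
# (50% of the vocabulary) and 4 shared by all three indexers (40%).
A = {0, 1, 2, 3, 4, 5, 7}
B = {0, 1, 2, 3, 4, 6, 8}
C = {0, 1, 2, 3, 5, 6, 9}
VOCAB = A | B | C


def share(*indexers):
    """Terms common to all given indexers, as a fraction of the vocabulary."""
    return Fraction(len(set.intersection(*indexers)), len(VOCAB))


def p_retrieve(k, record):
    """Probability that k distinct search terms, drawn at random from VOCAB
    and combined with AND, all appear in the given index record."""
    queries = list(combinations(sorted(VOCAB), k))
    hits = sum(1 for q in queries if record.issuperset(q))
    return Fraction(hits, len(queries))


print("each pair:", share(A, B), share(A, C), share(B, C))
print("all three:", share(A, B, C))
core = A & B & C  # the terms every indexer assigned
for k in range(1, 5):
    # one indexer's record vs. retrieval whichever indexer made the record
    print(k, p_retrieve(k, A), p_retrieve(k, core))
```

Under this assumed model, retrieval from one indexer's record falls from 7/10 for a one-term search to 7/24 for a three-term search, while a search that must succeed whichever indexer indexed the document can rely only on the four shared terms (2/5, then 1/30). Enlarging that shared core, that is, raising inter-indexer consistency, raises the second figure.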

The immediate objective of the present contract was to determine
the amount of inter- and intra-indexer consistency under two sets of
conditions. Phase I of the studies sought to determine the amount of
agreement attained by indexers who had, during indexing, no recourse
to look-up tools or indexing aids other than the indexing rules; Phase
II has attempted to determine the effect of indexing tools on inter-indexer
consistency. There has been no endeavor under this contract to
study the underlying assumption (that indexing reliability affects
retrieval efficiency). Phase I results were reported in two earlier
Technical Notes.

²The concept was developed by Mr. Wolf Kuebler of Documentation
Incorporated.

The first of these³ outlined the methodology for assessing
inter- and intra-indexer consistency under minimal conditions. The indexing
system chosen was the Uniterm system of coordinate indexing, employed
under conditions which were somewhat artificial and rigidly regulated to
exclude any factors which might raise the Base Zero value. The population
of documents selected for the inter-indexer consistency test was a
stratified random sample of 75 chemical patents. All indexers were
required to index the title and claims of each patent; the remainder of
the document was indexed according to their best judgment. In this
manner, a comparison was sought between the indexers' consistency in
rigidly defined document areas and their consistency in situations
allowing freedom of indexing choice.

Special conditions were necessary for the intra-indexer reliability
test to control the problem of the indexer's recall, or memory. The sample
of patents consisted of three batches of "equated" documents, each having
75 patents, on the assumption that the test results on one batch would
equal those on either of the other two batches.

The test subjects were two groups of indexers: three experienced and
three beginning indexers.

The hypotheses tested in Phase I were the following: (1) there is
no significant difference among the indexers in the number of terms used
to describe each section of a given patent; (2) there is no significant
difference among the indexers in the percentage of terms any one indexer
shares with any other indexer; and (3) there is no significant difference
in the indexing of an equated patent by any one indexer in either the
number of terms used or the percentage of terms used.

³J. Jacoby, Methodology for indexer reliability tests,
RADC-TN-62-1 (Bethesda, Md., Documentation Incorporated, March 1962).

The measure of inter-indexer consistency was defined by two
criteria: (1) the number of terms used by each indexer per document,
and (2) the percentage of matched terms employed by the experienced and
inexperienced indexers to index a document.
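The two criteria can be stated concretely. The sketch below assumes that each indexer's terms for a document are held as a set. Because this passage does not fix the denominator of the matched-term percentage, it assumes one common choice: the number of distinct terms used by either indexer of the pair. The indexer names and terms are hypothetical.

```python
from itertools import combinations


def terms_per_indexer(index_sets):
    """Criterion (1): number of terms each indexer assigned to the document."""
    return {name: len(terms) for name, terms in index_sets.items()}


def percent_matched(a, b):
    """Criterion (2), ASSUMED definition: terms chosen by both indexers as a
    percentage of the distinct terms chosen by either of them."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


def pairwise_consistency(index_sets):
    """Percentage of matched terms for every pair of indexers."""
    return {
        (x, y): percent_matched(index_sets[x], index_sets[y])
        for x, y in combinations(sorted(index_sets), 2)
    }


# Hypothetical Uniterm terms assigned to one section of one patent.
example = {
    "Indexer 1": {"polymer", "ethylene", "catalyst", "pressure"},
    "Indexer 2": {"polymer", "ethylene", "solvent"},
    "Indexer 3": {"polyethylene", "catalyst", "pressure"},
}

print(terms_per_indexer(example))
for pair, pct in pairwise_consistency(example).items():
    print(pair, round(pct, 1))
```

Other denominators, such as the terms used by one indexer of the pair, would give different percentages for the same sets, so figures computed this way are comparable only with figures computed under the same definition.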

The actual results of Phase I experiments were described in the
second Technical Note.⁴ They are summarized in the following paragraphs.

When the number of index terms was used as a measure for evaluation,
a lack of inter-indexer consistency was found. The
experienced indexers showed less variation than their inexperienced
colleagues, and a defined indexing area yielded more stability than an
unbounded area. A significant difference was found in the number
of terms assigned to chemical patents by individual indexers, thus
refuting the relevant hypothesis of this experiment. With regard to the
percentage of terms matched, the experienced indexers attained a
significantly higher degree of inter-indexer consistency than the
inexperienced indexers, with less internal variation. Both groups of
indexers attained a higher degree of consistency in the bounded section
of patents.

Upon integrating the results pertinent to these criteria of inter-indexer
consistency, it was concluded that, although there was a large
amount of individual variation, there is a significant difference in
consistency when experienced indexers are compared with beginners. The
experienced indexers use fewer terms and exhibit higher stability in
their choice of terms. Furthermore, fewer terms are used to index a
bounded, or defined, area of documents, and the percentage of
consistency is higher, than when an unbounded, or unspecified, area of
documents is indexed.

⁴J. Jacoby and V. Slamecka, Indexer consistency under minimal
conditions, RADC-TDR-62-426 (Bethesda, Md., Documentation Incorporated,
November 1962).

The results of the intra-indexer reliability tests have shown that,
on the whole, indexers tend to be consistent when they re-index equated
patents using a general term vocabulary. Inexperienced indexers tend to
be more consistent upon re-indexing an equated document in the bounded
section. Of particular interest was the fact that the two highest levels
of consistency were attained by inexperienced indexers. Since the results
of the intra-indexer consistency tests were somewhat qualified by the
methodology employed, it was agreed that this particular phase of the
investigations would be dropped from the Phase II studies.

The purpose of Phase II has been to investigate, theoretically and
experimentally, the possibilities of improving indexer reliability, as
determined in Phase I, through the use of various indexing aids designed
to overcome the limitations of indexer memory and to provide feedback
on the usefulness of indexing operations. The theoretical study
was published in a Final Technical Report entitled "Indexing Aids," which
analyzed classificatory schedules, alphabetical vocabularies, and
concept-association lists from the viewpoint of their utility in
coordinate indexing as aids prescribing invariable, and suggesting
variable, relationships among indexing terms.⁵

⁵V. Slamecka, Indexing aids, RADC-TDR-62-579 (Bethesda, Md.,
Documentation Incorporated, January 1963). See also V. Slamecka,
"Classificatory, Alphabetical, and Associative Schedules as Aids in
Coordinate Indexing," American Documentation, Vol. 14, No. 3 (July 1963).

With a view toward selecting indexing aids for testing in the
experimental  Phase  II,  the  following  observations  were  tentatively  drawn 
in  this  Report: 

(1)  Hierarchical  classification  schedules,  including  faceted 
schedules,  are  helpful  to  the  indexer  in  his  analysis  of  subject  matter. 
The rigidity of the hierarchical approach renders them only partially useful
as  aids  in  coordinate  indexing;  by  the  same  token,  however,  they  appear 
conducive  to  greater  indexer  reliability. 

(2) Alphabetical lists of terms usually contain only a limited
number of cross references among terms. When compiled from the terms
freely used by indexers, they can become useful guides to indexers only
after they have been carefully edited; and when provided with cross
references for the invariable relations among terms, they may serve as
indexing authorities, and thus be conducive to improved indexer
consistency.

(3) Concept-association lists differ from alphabetical lists in
that  they  also  display,  in  addition  to  the  invariable  relations  among 
terms,  a  great  number  of  variable  relationships  of  all  types.  The 
numerous  displays  of  variable  relationships  increase  the  range  of  the 
indexer's  choice  of  terms  by  suggesting  to  him  additional  possible 
indexing  terms;  as  a  result,  concept-association  displays  are  unlikely 
to  improve  the  consistency  of  indexing. 

The Final Technical Report concluded that indexing aids which
prescribe invariable term selections to the indexer are conducive to
better indexer consistency; on the other hand, the greater the number of
variable cross references in a tool, which indexers can employ only as
suggestions, the lower the probable value of that tool for the
improvement of indexing consistency. The Report also outlined the
methodology for an experimental investigation in which the effect of
indexing aids on indexer reliability would be studied. This methodology,
and the ensuing experimental results, are summarized in Chapter II.


CHAPTER  II 


THE EFFECT OF INDEXING AIDS ON INTER-INDEXER
CONSISTENCY:  EXPERIMENTAL  RESULTS 

The indexing aids tested in Phase II were selected on the basis of
their availability, their applicability to the document collection
(chemical patents) and to the system of indexing (Uniterm system of
coordinate indexing), economic considerations, and the preceding
theoretical analysis in Final Technical Report RADC-TDR-62-579. Drawn
from the alphabetical, associative, and hierarchical types of indexing
aids, the tools selected were, respectively, the Chemical Patents Code
Manual of Documentation Incorporated (hereafter referred to as the
Vocabulary), the Chemical Engineering Thesaurus of the American Institute
of Chemical Engineers, and the Manual of Classification of the United
States Patent Office. Following a brief outline of the methodology
employed, the results of the Phase II experiments are noted and compared
with the results of the Base Zero Test, and the amount of change in
inter-indexer reliability is evaluated.


Methodology

Each of the three tests, using one of the three indexing aids named
above, consisted of the indexing of 25 chemical patents, selected randomly
to eliminate bias due to subject matter and degree of complexity. The
indexing instructions (see Appendix III) required each of the three
experienced indexers to index the documents to the best of his ability,
noting separately the terms chosen for the bounded and unbounded
sections, and using one of the three indexing tools, simultaneously or
subsequently, to withdraw terms from or add terms to his selection. Each
change in the selection of terms caused by his consulting the aid was to
be noted by the indexer.

The independent variables evaluated are: (1) the indexers, (2) the
sections of the patent, and (3) the patents. The dependent variables,
which measure the amount of reliability, or consistency, are: (1) the
number of terms used, and (2) the percentage of terms matched between
any two indexers. The hypotheses are given with the results of these
individual tests.

A special analysis was proposed for the "term origin" for each
tool, as indicated by indexers in the following five categories:

(1) original term found in indexing aid and retained; (2) original term
not found in indexing aid but retained; (3) original term rejected upon
inspection of the aid, whether found or not, and no new term substituted;
(4) original term rejected upon inspection of the aid, and a new term
substituted; and (5) term suggested by and adopted from the indexing aid.
The analysis of these categories was intended to determine quantitatively
the nature of the effect of each indexing tool on indexer consistency (that
is, which of the categories had the most pronounced effect); as will be
shown, the results are not complete because two of the three indexers
failed to indicate properly the categories they used.
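The five categories lend themselves to a simple tally. The sketch below is hypothetical: it assumes each term an indexer recorded carries one category number from 1 to 5, and it treats categories 3-5 as the cases in which consulting the aid changed the indexer's selection. That is one possible reading of "the nature of the effect", not a measure defined in this report.

```python
from collections import Counter
from enum import IntEnum


class TermOrigin(IntEnum):
    """The five term-origin categories indexers were asked to record."""
    FOUND_AND_RETAINED = 1        # original term found in the aid and retained
    NOT_FOUND_BUT_RETAINED = 2    # original term not found in the aid but retained
    REJECTED_NO_SUBSTITUTE = 3    # original term rejected, no new term substituted
    REJECTED_AND_SUBSTITUTED = 4  # original term rejected, new term substituted
    SUGGESTED_BY_AID = 5          # term suggested by and adopted from the aid


# Hypothetical reading: categories 3-5 are the cases in which the aid
# changed the indexer's selection.
CHANGED_BY_AID = {
    TermOrigin.REJECTED_NO_SUBSTITUTE,
    TermOrigin.REJECTED_AND_SUBSTITUTED,
    TermOrigin.SUGGESTED_BY_AID,
}


def origin_profile(labels):
    """Share of labeled terms in each category; invalid labels raise ValueError."""
    counts = Counter(TermOrigin(label) for label in labels)
    total = sum(counts.values()) or 1  # avoid dividing by zero for no labels
    return {cat.name: counts[cat] / total for cat in TermOrigin}


def aid_influence(labels):
    """Share of labeled terms whose selection the aid changed."""
    origins = [TermOrigin(label) for label in labels]
    if not origins:
        return 0.0
    return sum(origin in CHANGED_BY_AID for origin in origins) / len(origins)


# Hypothetical labels recorded by one indexer for one patent.
labels = [1, 1, 1, 2, 5, 4, 1, 3, 2, 1]
print(origin_profile(labels))
print(aid_influence(labels))
```

Converting each label with `TermOrigin(...)` rejects anything outside 1-5 with a `ValueError`, which would flag improperly labeled terms of the kind that left this analysis incomplete.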

Number  of  Terms  Used 

As a measurement of the relative effect of each indexing aid on
the volume of indexing, the number of terms used to index each patent
was analyzed. It was hoped that if these indexing aids generated
substantially different numbers of terms, some conclusions could be
drawn concerning their quantitative characteristics. It must be noted,
however, that the number of terms is not a proper criterion as a single
measure of indexer consistency; rather, it is a quantitative indication
of the volume of indexing generated by the use of these aids.

Three  hypotheses  were  tested: 

(1)  There  is  no  significant  difference  among  indexers,  patent 
sections,  and  patents  with  respect  to  the  number  of  terms  used  to  index 
a  document; 

(2)  There  is  no  significant  difference  between  the  test  results 
of  indexing  under  minimal  conditions  (hereafter  referred  to  as  the  Base 
Zero  Test)  and  the  results  recorded  by  the  use  of  any  given  indexing 
tool;

(3)  There  is  no  significant  difference  in  the  test  results 
recorded  by  all  indexing  tools. 
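These hypotheses were tested by analysis of variance (Appendix II, not reproduced here). The factorial design behind the tables that follow, with indexers, sections, patents, and their interactions, is too large to show briefly. The sketch below therefore computes only the one-way case, the F ratio for differences among test conditions, and the data in its example are invented. The resulting F ratio would be compared with a tabulated critical value for the given degrees of freedom.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    groups maps a test condition to its observations, e.g. the number of
    terms assigned to each patent under that condition.
    """
    samples = [list(values) for values in groups.values()]
    all_values = [x for sample in samples for x in sample]
    grand_mean = fmean(all_values)
    means = [fmean(sample) for sample in samples]
    # Between-conditions sum of squares: condition means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-conditions sum of squares: observations around their own mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented counts of terms per patent under two conditions.
f, df_b, df_w = one_way_anova({
    "Base Zero": [30, 35, 40],
    "Vocabulary": [45, 50, 55],
})
print(f"F({df_b}, {df_w}) = {f:.2f}")
```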

Results

Tables  1-3  indicate  the  significant  variables  which  resulted  from 
the  analyses  of  variance  (see  Appendix  II)  of  the  three  indexing  aids 
and  the  Base  Zero  Test.  Table  4  summarizes  the  results  obtained  by  the 
indexers  under  each  test  condition. 

The following main points summarize the results obtained in these
analyses:

TABLE 1

NUMBER OF TERMS USED: SIGNIFICANT VARIABLES
ACTING WITHIN EACH INDEXING AID

Item                         Base Zero   Thesaurus  Vocabulary   Patent Office
                                                                Classification

Indexers                             X           -           -               -
Sections                             X           X           X               X
Patents                              X           X           -               -
Indexer/Sections                     -           X           X               X
Indexer/Patents                      -           X           X               -
Section/Patents                      X           X           X               X

TABLE 2

NUMBER OF TERMS USED: SIGNIFICANT VARIABLES
FROM COMPARISON OF EACH INDEXING AID
WITH BASE ZERO TEST

Item                         Thesaurus  Vocabulary   Patent Office
                                                    Classification

Aid/Patents                          -           -               X
Aid/Indexer/Sections                 X           X               X
Aid/Section/Patents                  X           -               X

TABLE 3

NUMBER OF TERMS USED: SIGNIFICANT VARIABLES
FROM COMPARISON OF INDEXING AIDS

Aid/Indexers
Aid/Patents
Aid/Indexers/Patents
Aid/Section/Patents

TABLE 4

NUMBER OF TERMS USED: RESULTS OF TESTS, AND SUMMARY

Item                         Base Zero   Thesaurus  Vocabulary   Patent Office
                                                                Classification

Avg. no. terms used by indexers
  per patent                      37.8        46.4        50.5            43.4

Avg. no. terms used by indexers per section
  Bounded                         13.4        18.7        19.3            16.1
  Unbounded                       24.4        27.7        31.2            27.3

Avg. no. terms used per indexer
  Indexer 1                       34.3        56.5        56.5            47.4
  Indexer 2                       34.8        36.3        38.2            35.7
  Indexer 3                       44.2        46.3        56.8            47.1

Avg. no. terms used per indexer per section
  Bounded
    Indexer 1                     11.1        25.6        24.6            19.1
    Indexer 2                     12.1        13.7        13.3            12.9
    Indexer 3                     16.9        16.8        19.8            16.5
  Unbounded
    Indexer 1                     23.2        30.9        31.9            28.4
    Indexer 2                     22.6        22.6        24.8            22.8
    Indexer 3                     27.3        29.5        36.9            30.6

Avg. variation between patents for each indexer (standard deviations)
  Indexer 1                       10.1        13.2        15.2            17.8
  Indexer 2                        7.7         9.8        10.2            10.4
  Indexer 3                        9.3        12.4        19.9            16.2

Avg. variation between patents per section (standard deviations)
  Bounded                          7.3        10.5        20.0            20.1
  Unbounded                        8.9        12.8        24.5            28.9

Avg. variation between patents per tool
  (standard deviations)           28.2        47.9        39.4            40.3

(1) Table 4 shows that the largest average number of terms used
to index a patent was generated by the use of the Vocabulary, and the
smallest in the Base Zero Test. The differences between the test
conditions are, however, not substantial enough to be statistically
significant.

(2) Of significance is the fact that with all the indexing aids
tested, about 62 per cent of the terms used per patent were allocated to
the unbounded section of the patent. Not only were more terms used in
this section, but the number of terms used for it also ranged more widely,
as shown by the larger amount of variation present.

(3) Only the Thesaurus yielded a wide variation in the volume of
indexing per patent regardless of section. All other tools showed no
such great variability because of the canceling effect of the relative
consistency in the bounded section of the patents. When these aids were
compared to the Base Zero Test, however, only the Patent Office
Classification list differed substantially from the amount of variation
among patents present under this minimal condition. When these aids were
compared to each other, on the other hand, they all differed
significantly, the most variable being the Thesaurus, the least the
Vocabulary.

(4) Indexer consistency under each test condition was, on the
average, good, although differences were noted between the indexing aids.
The trend was as follows: all indexers used the highest number of terms
per patent when using the Vocabulary, the second largest number of terms
with the Thesaurus, and the third largest with the Patent Office
Classification manual. All indexers tended to be more variable in the
range of the number of terms they used in the unbounded section than in
the bounded section. Finally, the Vocabulary and Patent Office
Classification list were the only two aids which resulted in the same
ranking of indexers in both sections: Indexer 1 using the largest number
of terms, Indexer 2 using the fewest terms.

(5) The standard deviation from the average number of terms used
by each indexer indicates how much the number of terms he assigned varied
from patent to patent. The results of this examination show that
the indexer who used the largest number of terms per patent also tended
to be the most variable in the number of these terms, except in the Base
Zero Test, and that he tended to be more variable in the unbounded section
than in the bounded.
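Points (1) and (2) can be checked against Table 4. The sketch below uses only the Table 4 averages: it confirms that the bounded and unbounded averages add up to the per-patent average under every test condition, and it computes the unbounded section's share of terms, which comes to roughly 60-63 per cent for each aid and about 61.5 per cent averaged over the three aids.

```python
# Average number of terms per patent from Table 4:
# condition -> (bounded section, unbounded section, whole patent)
TABLE4 = {
    "Base Zero": (13.4, 24.4, 37.8),
    "Thesaurus": (18.7, 27.7, 46.4),
    "Vocabulary": (19.3, 31.2, 50.5),
    "Patent Office Classification": (16.1, 27.3, 43.4),
}


def unbounded_share(condition):
    """Percentage of a patent's terms that fall in the unbounded section."""
    _, unbounded, per_patent = TABLE4[condition]
    return 100 * unbounded / per_patent


for condition, (bounded, unbounded, per_patent) in TABLE4.items():
    # The two section averages should add up to the per-patent average.
    assert abs(bounded + unbounded - per_patent) < 0.05, condition
    print(f"{condition}: {unbounded_share(condition):.1f}% unbounded")

aids = [c for c in TABLE4 if c != "Base Zero"]
mean_share = sum(unbounded_share(c) for c in aids) / len(aids)
print(f"Mean over the three aids: {mean_share:.1f}%")
```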

Conclusions

The  conclusions  which  can  be  drawn  from  this  phase  of  the  test 
concern  the  differences  in  the  quantitative  characteristics  generated  by 
the  three  types  of  indexing  aids. 

(1) The Thesaurus, the aid with the highest number of suggestive
cross references, generated slightly fewer terms than the Vocabulary.
The lowest number of terms, generated by the Patent Office Classification
manual, does not represent a significant decrease from the other tools.

(2)  There  is  generally  good  indexer  consistency  within  each 
indexing  aid,  although  when  the  aids  are  compared  there  is  a  definite 
difference  between  them  with  regard  to  the  number  of  terms  used  by  an 
indexer;  the  more  terms  used,  the  greater  the  variability. 

(3) The largest and most striking difference, both within each aid
and when the aids are compared with each other, is that between the bounded
and the unbounded sections. It appears that if the section to be indexed is
clearly defined, fewer terms are used by the indexers, regardless of
indexing aid, and less variability is encountered in the number of terms
used per patent.

(4) The patents vary in length, and so does the number of terms
deemed necessary by the indexer to describe them. Again, where the
number of terms is largest, so is the amount of variation from patent
to patent.

In  general,  when  the  tools  are  compared,  there  was  less  variability 
in  the  number  of  terms  used  in  the  Base  Zero  Test  and  the  Thesaurus,  but 
as  a  rule  a  higher  consistency  of  trend  with  the  Vocabulary  and  the 
Patent  Office  Classification  manual. 

Percentage  of  Terms  Matched 

The  criterion  of  evaluation  here  was  the  percentage  of  terms 
matched  between  pairs  of  indexers.  The  primary  purpose  of  this  part  of 
the  investigation  was  to  attempt  to  ascertain  whether  any  differences  in 
the  percentage  of  terms  matched  by  these  indexers  could  be  attributed  to 
the  indexing  tool  they  utilized.  The  secondary  aim,  also  included  in 
this  section,  was  the  measurement  of  the  differences  which  occurred 
between  matches  in  different  types  of  indexed  areas--those  which  were 
specified  and  those  in  which  the  indexer  was  allowed  to  choose  which 
item  he  desired  to  index.  Also,  a  test  was  made  to  discover  whether  any 
of  these  indexing  aids  led  to  greater  consistency  among  the  indexers. 

All  three  aids  were  compared  to  the  Base  Zero  Test  to  note  any  differences. 

In the Base Zero Test, the differences compared were between two
groups of indexers, the experienced and the inexperienced. In Phase II,
where there no longer existed an inexperienced group, the differences
examined were those between pairs of indexers. This does not invalidate
the conclusions of Phase I, since there was no significant difference
between the indexer pairs with respect to the percentage of terms
matched (and hence the experienced indexers could be treated in Phase I
as a homogeneous group). The Phase II tabulations for the Base Zero
Test thus amplify those of Phase I.

The  following  three  hypotheses  were  tested: 

(1) There is no significant difference among indexers, patent
sections, and patents with respect to the percentage of terms matched
between the indexers;

(2)  There  is  no  significant  difference  between  the  test  results 
of  indexing  under  minimal  conditions  and  the  results  recorded  by  the  use 
of  any  given  indexing  tool; 

(3)  There  is  no  significant  difference  in  the  test  results 
recorded  by  all  indexing  tools. 

Results 

A  summary  of  the  significant  variables  as  determined  by  analysis 
of  variance  techniques  (see  Appendix  II)  is  presented  in  Tables  5-7, 
followed  by  a  summary  of  the  results  in  Table  8. 


TABLE 5

PERCENTAGE OF TERMS MATCHED: SIGNIFICANT VARIABLES
ACTING WITHIN EACH INDEXING AID

Item                         Base Zero   Thesaurus  Vocabulary   Patent Office
                                                                Classification

Indexers                             -           -           X               X
Sections                             -           -           X               X
Patents                              -           X           -               -
Indexer/Section                      X           X           -               -
Indexer/Patents                      X           X           X               X
Section/Patents                      -           -           X               X

TABLE 6

PERCENTAGE OF TERMS MATCHED: SIGNIFICANT VARIABLES
FROM COMPARISON OF EACH INDEXING AID
WITH BASE ZERO TEST

Item                         Thesaurus  Vocabulary   Patent Office
                                                    Classification

Test Conditions (TC)                 -           X               X
TC/Section                           -           X               X
TC/Indexer/Patents                   -           X               X
TC/Section/Patents                   -           X               X

TABLE 7

PERCENTAGE OF TERMS MATCHED: SIGNIFICANT VARIABLES
FROM COMPARISON OF INDEXING AIDS

Test Conditions (TC)
TC/Sections
TC/Indexers/Patents
TC/Sections/Patents

TABLE 8

PERCENTAGE OF TERMS MATCHED: RESULTS OF TESTS, AND SUMMARY

Item                         Base Zero   Thesaurus  Vocabulary   Patent Office
                                                                Classification

Avg. % terms matched per patent section by any pair of indexers
  Bounded                          9.5         8.2        41.2            40.6
  Unbounded                        8.8         8.4        35.2            34.3

Avg. % terms matched between indexer pairs per section
  Bounded
    Indexers 1-2                   3.0         4.4        33.1            38.1
    Indexers 1-3                  12.8        10.5        43.8            37.9
    Indexers 2-3                  12.9         9.6        46.9            45.8
  Unbounded
    Indexers 1-2                   5.7         4.5        30.5            29.8
    Indexers 1-3                   5.9         8.4        35.5            31.6
    Indexers 2-3                  14.7        12.5        39.7            41.6

Standard deviation in patents for each pair of indexers
  Indexers 1-2                     6.5         4.0        18.8            18.8
  Indexers 1-3                    13.3         6.1        20.0            17.6
  Indexers 2-3                     6.6         6.1        18.4            25.1

Standard deviation in patents per section
  Bounded                         13.0         6.2        30.0            29.6
  Unbounded                        6.3         6.2        18.2            26.8

Standard deviation in patents
  per tool                        15.5        16.0        12.5            14.6

Note:  No  analysis  was  performed  to  ascertain  the  average  number  of 
terms  matched  per  patent.  Since  identical  terms  sometimes  occur  in  both 
sections  of  a  given  patent,  it  is  not  possible  to  take  the  arithmetic 
mean  of  the  sections  as  the  average  number  of  terms  matched  per  patent. 
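The caution in this note can be illustrated with a hypothetical pair of indexers. When the same matched term occurs in both sections, the per-patent count must come from the union of the two sections' matched terms; combining the section figures arithmetically counts that term twice.

```python
# Hypothetical terms matched by one pair of indexers on one patent.
matched_bounded = {"polyethylene", "catalyst", "pressure"}
matched_unbounded = {"catalyst", "solvent"}

# "catalyst" was matched in both sections, so adding the section counts
# counts it twice; the per-patent figure needs the union of the two sets.
added_counts = len(matched_bounded) + len(matched_unbounded)
per_patent = len(matched_bounded | matched_unbounded)
print(added_counts, per_patent)  # 5 4
```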

(1) Table 8 shows that a striking increase in the percentage of
terms matched between any two indexers in either patent section occurred
with the use of the Vocabulary and the Patent Office Classification list.
From an average of about 8 or 9 per cent of matching terms in each
section in the Base Zero Test and with the use of the Thesaurus, the
percentage of matching terms increased to about 38 per cent with the
use of the other two aids.

(2)  Of  equal  importance  is  the  fact  that  the  Thesaurus,  as  an  aid 
to  indexers,  shows  no  significant  differences  in  any  of  the  variables 
examined  when  compared  to  the  Base  Zero  Test;  rather,  it  is  remarkably 
similar to the latter (cf. Table 6).

(3)  As  was  true  in  the  Base  Zero  Test  and  with  all  the  indexing 
aids  tested,  a  larger  or  equal  percentage  of  matching  terms  was  recorded 
in  the  bounded  (specified)  section  of  the  patent  than  in  the  unbounded 
section.  The  variation  in  the  range  of  percentages  was  smaller  when  the 
Vocabulary  and  the  Patent  Office  Classification  aids  were  used;  this  was 
not  the  case  with  the  Base  Zero  and  Thesaurus  tests. 

(4)  Even  though  the  Vocabulary  and  Patent  Office  Classification 
aids  showed  the  highest  percentages  of  matching  terms,  they  were  the 
only  two  aids  which  also  showed  significant  differences  between  the 
indexer  pairs.  In  other  words,  only  with  these  aids  was  there,  on  the 
average,  substantial  indexer  inconsistency.  The  Vocabulary  and  Patent 
Office  Classification  aids  also  showed  the  largest  variation  in  the 
range  of  the  percentage  of  terms  matching  per  patent  between  indexers 
and  patent  sections. 
Conclusions

The experiment established that the alphabetical and hierarchical indexing aids produce a significant increase in the percentage of matched terms. The associative tool, on the other hand, failed to improve on the consistency level of the Base Zero Test.

Term Origin Analysis

In Phase II, the indexers were asked to label the source or treatment of each term used to index a patent. Five classifications of "term origin" were established in an attempt to determine the role of the indexing aid and to measure the amount of uncertainty or indecision on the part of the indexer. These five categories of term origin were: (1) original term found in indexing aid and retained; (2) original term not found in indexing aid but retained; (3) original term rejected upon inspection of the aid, whether found or not, and no new term substituted; (4) original term rejected upon inspection of the aid, and a new term substituted; and (5) term suggested by, and adopted from, the indexing aid.
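The five categories amount to a small decision rule over what happened to each term. The sketch below encodes one reading of that rule; the function name and its boolean parameters are hypothetical, and it assumes that a category 5 term is one the indexer had not selected before consulting the aid.

```python
from enum import IntEnum


class TermOrigin(IntEnum):
    """Phase II term-origin categories, numbered as in the text."""
    FOUND_AND_RETAINED = 1
    NOT_FOUND_BUT_RETAINED = 2
    REJECTED_NO_SUBSTITUTE = 3
    REJECTED_AND_SUBSTITUTED = 4
    SUGGESTED_BY_AID = 5


def classify_term(originally_selected: bool, found_in_aid: bool,
                  retained: bool, substituted: bool) -> TermOrigin:
    """Hypothetical mapping from an indexer's decision to a term-origin category."""
    if not originally_selected:
        # The term came from the aid rather than from the indexer's first reading.
        return TermOrigin.SUGGESTED_BY_AID
    if retained:
        if found_in_aid:
            return TermOrigin.FOUND_AND_RETAINED
        return TermOrigin.NOT_FOUND_BUT_RETAINED
    # Rejected upon inspection of the aid, whether found or not.
    if substituted:
        return TermOrigin.REJECTED_AND_SUBSTITUTED
    return TermOrigin.REJECTED_NO_SUBSTITUTE
```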

Certain limitations to this part of the Phase II program became apparent after the results were analyzed. First, the indexers failed to record a "term origin" classification for some of the terms chosen (hence the term totals in Table 9 differ from the totals of terms used in Appendices I-4, I-5, and I-6). Second, the distributions among the categories in Table 9 showed that Indexer 1 was at wide variance with the other two indexers. In a post-experimental interview, Indexers 2 and 3 stated that they had not fully understood the instructions for the term origin phase, even though they had stated before the test was administered that they did. It is the feeling of the authors that the psychological act of affirming indecision, as exemplified by categories 3-5, was powerful enough to preclude an objective categorization of term origins by these two indexers. Hence, the results and conclusions stated in the following pages must be interpreted in the light of these findings.

Results

The  results  of  the  distribution  of  the  number  of  terms  by  origin, 
indexer,  and  indexing  aid  are  presented  in  Table  9. 


TABLE  9 

DISTRIBUTION  OF  THE  ORIGIN  OF  TERMS  BY 
INDEXING  AID  AND  INDEXER 


                                   Thesaurus        Vocabulary    Patent Office
Term Origin                        Indexer          Indexer       Indexer

1. Original term found
   and retained                    638  415  577    658  831      345  313  432

2. Original term not
   found but retained              117  161  179      2    2      284  298  367

3. Original term rejected,
   no new term used                 59    1    0      2    1       42    0    0

4. Original term rejected,
   new term used                    51    3    0     83    8       73    0    1

5. New term suggested and
   adopted by aid                   44    7    0     15    1        6    2    0


This table illustrates the great discrepancy between Indexer 1 and the other two indexers. Just as it would not be experimentally proper to eliminate two test subjects because they did not produce the expected results, neither can Indexer 1 be eliminated. If Indexers 2 and 3 are considered representative, the results show that they overwhelmingly exhibited no indecision (as represented by categories 3-5), but relied upon their original choice of an indexing term as the final one. Only with the Vocabulary did they exclude their original terms if the latter were not located in that aid. It is felt that this is because they were more familiar with the Vocabulary listing and hence placed more faith in its ability to affirm or deny their original selection of a term.

When, however, Indexer 1 is considered truly representative of the influence of tools upon decisions to accept or reject a term, or even to suggest a new term, an entirely different analysis of these tools is possible. We feel strongly that this was the case; hence the analysis of this distribution is considered more fully below. It must be borne in mind, however, that the following conclusions are based upon one indexer's reactions and not the majority's:

(1) The share of Category 1 (original term found and retained) varies considerably among the indexing aids. The Vocabulary rates highest; the Thesaurus presented more difficulty in locating a term, and the Patent Office Classification manual presented even more. With both of these latter tools, the category "original term not found but retained" accounts for the second largest percentage of the indexer's terms.

(2) In relatively few cases was the original term rejected with no new term substituted for it; this happened more frequently with the Thesaurus and the Patent Office Classification manual than with the Vocabulary. The trend is clearer if Indexer 1 alone is taken as the indicator: he rejected almost 7 per cent of his terms with the Thesaurus, about 6 per cent with the Patent Office Classification manual, and only 0.3 per cent with the Vocabulary.

(3) The substitution of a new term for a rejected term occurred more frequently with the Vocabulary (and the Patent Office Classification manual) than with the Thesaurus, which presented more references to the indexer. Again, looking at Indexer 1, we see that he substituted almost 11 per cent of his terms with the Vocabulary, almost 10 per cent with the Patent Office Classification manual, and only 6 per cent with the Thesaurus.

(4) The Thesaurus comes to the fore with regard to the percentage of terms suggested by the indexing aid: Indexer 1 adopted almost 5 per cent of his terms from it, as compared with his adoption rates of 2 per cent and 0.8 per cent with the Vocabulary and the Patent Office Classification manual, respectively.
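The Indexer 1 percentages quoted in items (2) to (4) can be recomputed from Table 9. The sketch below assumes that the first value in each aid's group of Table 9 belongs to Indexer 1 (an assumption about the extracted layout, though it is consistent with every percentage quoted above) and divides each category count by that indexer's classified total for the aid.

```python
# Indexer 1 counts for term-origin categories 1-5, read from Table 9
# (assumes the first value in each aid's group is Indexer 1).
INDEXER_1 = {
    "Thesaurus": [638, 117, 59, 51, 44],
    "Vocabulary": [658, 2, 2, 83, 15],
    "Patent Office": [345, 284, 42, 73, 6],
}


def category_percentages(counts):
    """Percentage of an indexer's classified terms falling in each category."""
    total = sum(counts)
    return [100 * c / total for c in counts]


pct = {aid: category_percentages(c) for aid, c in INDEXER_1.items()}
for aid, values in pct.items():
    print(aid, [round(v, 1) for v in values])
```

Rounded to one decimal, categories 3 to 5 come out at 6.5, 5.6 and 4.8 per cent for the Thesaurus, 0.3, 10.9 and 2.0 for the Vocabulary, and 5.6, 9.7 and 0.8 for the Patent Office manual.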

Conclusions 

Even with the evident limitations of this type of subjective and inherently qualitative test, it appears that experienced indexers show a predominant tendency to retain their original selections of indexing terms, whether or not they find them in an indexing aid.

However, when their tendencies toward indecision and uncertainty are evaluated, there are variations which can be attributed to the indexing aids. The Thesaurus's strongest force seems to lie in its ability to suggest new terms to the indexer; the Vocabulary and the Patent Office Classification manual appear to be about equal in their ability to cause the indexer to reject his original term (under conditions of indecision) and substitute another one for it. At the same time, the Vocabulary is least helpful when an indexer rejects a term without finding a new one; the other two aids offer the indexer a better possibility for term substitution.


CHAPTER  III 


PHASE  II:  CONCLUSIONS  AND  SUMMARY 

(1) With the use of indexing aids, indexers generate on the average fewer terms, and attain a higher inter-indexer consistency, in the bounded section of documents than in the unspecified section. This trend agrees with the conclusions from the Phase I experiments.

(2) With respect to the reliability of indexers, Phase II tests show a significant difference between the results of unaided Base Zero indexing and those recorded with the use of two of the indexing aids, as well as differences in the individual effects of the aids used. While there was no significant difference in the effect on the variables examined between the Base Zero and the Thesaurus tests, the Vocabulary and the Manual of Classification yielded a significantly higher inter-indexer consistency than either the Base Zero or the Thesaurus test.

With regard to the three indexing aids tested, the combined results of the statistical analyses support the following conclusions and their interpretation:

The alphabetical Vocabulary (the Chemical Patents Coding Manual of Documentation Incorporated) produced the highest average consistency among indexers (about 38 per cent per patent section).

This conclusion appears to reflect the inherent character of an alphabetical subject-authority list, edited for synonyms and other undesirable terms, and containing a small number of invariable ("see" and "also post on") instructions but no variable ("see also" and "related term") cross references. The absence of variable referrals accounts for the observed fact that the Vocabulary gave indexers little opportunity for term substitution even when they decided to delete an originally selected term. The Vocabulary thus acted as a subject-term authority list, with its attendant standardizing influence on the indexing language; in this light, the fact that the tool generated the largest number of terms per patent (even though the difference from the other aids was not significant) indicates the greatest depth of subject analysis. The edited structure of the Vocabulary, and the previous experience of two indexers in using it, explain why indexers only rarely rejected an originally selected term, and why they were able to locate more terms in the Vocabulary than in the other indexing aids.
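The standardizing effect attributed to invariable instructions can be sketched in a few lines. In this hypothetical model (the entries and the function name are invented for illustration, not taken from the Documentation Incorporated list), a "see" instruction maps a non-preferred term to exactly one preferred term and an "also post on" instruction adds exactly one companion term, so two indexers who start from different synonyms end up posting the same terms.

```python
# Hypothetical authority-list entries; each instruction has exactly one target.
SEE = {"polythene": "polyethylene", "ethene polymer": "polyethylene"}
ALSO_POST_ON = {"polyethylene": "polyolefins"}


def post_terms(selected):
    """Apply invariable 'see' and 'also post on' instructions to selected terms."""
    posted = set()
    for term in selected:
        preferred = SEE.get(term, term)   # redirect a non-preferred synonym
        posted.add(preferred)
        if preferred in ALSO_POST_ON:     # add the single prescribed companion term
            posted.add(ALSO_POST_ON[preferred])
    return posted


indexer_a = post_terms(["polythene", "catalyst"])
indexer_b = post_terms(["ethene polymer", "catalyst"])
print(indexer_a == indexer_b)  # True
```

Because every instruction has a single target, the lookup leaves the indexer no choice; a variable "see also" entry would instead return several candidates.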

The  Patent  Office  Manual  of  Classification  produced  the  lowest 
number  of  terms  per  patent  (although  not  significantly  so);  with  its 
aid,  the  indexers  attained  the  second  highest  average  consistency,  only 
slightly  lower  than  with  the  Vocabulary. 

These conclusions may plausibly be interpreted as follows. As a hierarchical classificatory schedule, the Manual differs from the Vocabulary in containing fewer chemical terms (as a result of which it presented the greatest difficulty of the three tools in locating selected terms), and in providing limited, non-prescriptive hierarchical guidance for term selection. Since this generic display of relationships is usually unidirectional, and frequently involves only one level of hierarchy, the suggestive power of the Manual is relatively low. The classification schedule therefore also acts largely as an authority list, although a less comprehensive one than the Vocabulary: it advises the indexer whether to retain or drop a term, and it suggests to him a small number of additional, generically related terms. In this manner, the Manual is of more help than the Vocabulary when the indexer wishes to find a replacement for an originally selected term. The fact that its display of a small number of variable generic relationships among terms did not appreciably affect the high inter-indexer consistency lends support to the untested postulate that the display of variable generic (vertical) relationships in indexing aids is not as harmful to indexer reliability as that of variable semantic (horizontal) relationships among terms.
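The Manual's limited, unidirectional generic guidance can be pictured as a pointer from each class to the single class above it. In the hypothetical schedule fragment below (class names and the function are invented for illustration), a lookup yields at most one broader term per level, and only upward, which is why such a display offers the indexer few competing choices.

```python
# Hypothetical fragment of a classification schedule: each class has one parent.
PARENT = {
    "polyethylene": "polyolefins",
    "polypropylene": "polyolefins",
    "polyolefins": "synthetic resins",
}


def generic_suggestions(term, levels=1):
    """Return broader classes above `term`, walking upward at most `levels` steps."""
    suggestions = []
    current = term
    for _ in range(levels):
        current = PARENT.get(current)
        if current is None:  # reached the top of the schedule
            break
        suggestions.append(current)
    return suggestions


print(generic_suggestions("polyethylene"))            # ['polyolefins']
print(generic_suggestions("polyethylene", levels=2))  # ['polyolefins', 'synthetic resins']
```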

The Chemical Engineering Thesaurus did not lead the indexers to use an appreciably higher number of terms than the other tools, but it did produce a significant variation in the volume of indexing per patent regardless of section. With its aid, the indexers registered the lowest inter-indexer consistency level in the whole project (about 8 per cent per patent section), slightly below that of the unaided Base Zero Test. The Thesaurus demonstrated the strongest impact in its ability to suggest new terms; at the same time, while it led to the largest number of term rejections, it was weakest in offering substitutions for the rejected terms.

An explanation of these trends may again be sought in the character of thesauri. As concept-association aids, they contain large numbers of variable ("related term," "see also") cross references which are to be employed according to each indexer's best judgment; in permitting this semantic freedom of term assignment, thesauri are the least prescriptive and authoritative of indexing aids. Even though the A.I.Ch.E. Thesaurus was least helpful in offering indexers substitutions for rejected terms (an indecision that may result from the indexers' lack of familiarity with the tool as well as from a characteristic inherent in it), the probable net result of their term replacements was the introduction of a greater variety of terms, and hence a greater inter-indexer inconsistency. The Manual of Classification and the Vocabulary, on the other hand, which were responsible for more term substitutions, usually presented the indexers with a single possibility of term replacement, so that the final effect was one of vocabulary standardization.
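The contrast between an aid that offers one replacement and an aid that offers several can be made concrete with a deliberately simple model that is not part of the study: suppose two indexers who reject the same term each pick a replacement independently and uniformly at random from the k candidates the aid displays. They then agree with probability k × (1/k)² = 1/k, so a single prescribed replacement (k = 1) always agrees, while several variable references dilute agreement. The function names are hypothetical.

```python
import random
from fractions import Fraction


def exact_agreement(k: int) -> Fraction:
    """P(two independent uniform picks among k candidates coincide) = k * (1/k)**2."""
    return k * Fraction(1, k) ** 2


def simulated_agreement(k: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(k) == rng.randrange(k) for _ in range(trials))
    return hits / trials


for k in (1, 2, 5):
    print(k, exact_agreement(k), round(simulated_agreement(k), 3))
```

Real indexers do not choose at random, so the model only indicates the direction of the effect, not its size.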

Phase II experiments were not concerned with the effect of indexing aids upon intra-indexer consistency, and hence no conclusions in this respect are supported by the statistical data obtained.


CHAPTER  IV 


PROJECT  SUMMARY  AND  RECOMMENDATIONS 

The results of the investigations under Contract AF30(602)-2616
support  two  general  conclusions: 

(1)  Reliability  of  indexing,  with  or  without  the  use  of  indexing 
aids,  is  higher  when  experienced  indexers  are  instructed  to  index  a 
specified,  bounded  portion  (or  portions)  of  documents.  If  this 
conclusion  also  applies  to  classes  of  documents  other  than  chemical 
patents,  it  follows  that  indexing  from  titles  or  abstracts,  or  from 
other  defined  portions  of  documents,  is  more  consistent  than  "random" 
indexing  from  the  entire  document.  The  present  study  did  not,  however, 
seek  to  compare  the  adequacy  of  subject  coverage  attained  by  either 
method  of  indexing. 

(2) Inter-indexer consistency improves significantly with the use of prescriptive indexing aids which contain a minimal display of the variable semantic relationships among terms. The use of indexing aids which enlarge the indexers' semantic freedom of term choice is detrimental to indexing reliability. These conclusions imply that greater consistency in coordinate indexing, and hence an improvement in the quality of indexing, lies in the direction of controlled indexing vocabularies which formalize the relationships of terms so as to uniformly and invariably prescribe the choice of indexing terms.

The above conclusions are valid for the conditions of the experiment, that is, for the Uniterm system of coordinate indexing, for the document population of chemical patents, and for the group of indexers used. If the interpretation of the project results given in the preceding chapter is valid, however, they also apply to other systems of indexing and to other types of documents.

Since  it  was  not  possible  in  this  experiment  to  obtain  unequivocal 
evidence  for  the  interpretation  of  the  test  results  (cf.  pp.  22-24),  the 
authors  believe  that  there  is  a  need  for  further  experimentation  in  a 
direction  which  would  test  the  validity  of  their  interpretations. 

Future  experiments  of  the  type  reported  here  should  also  attempt  to 
mitigate  the  subjective,  unavoidable  effect  of  previous  experience  of 
indexers  by  selecting  a  larger  number  of  test  subjects  having  a  varied 
indexing  experience  and  background. 

Consistency among indexers is desirable on the assumption (not investigated in this project) that it improves the effectiveness of information retrieval; this assumption, in turn, is valid if the terms selected by indexers for a given document are all the terms properly descriptive of that document, and if they fully suffice to retrieve it. (Without this condition, indexers might conceivably be consistent in the selection of too few, or improper, terms, yet have no desirable effect on the effectiveness of retrieval.) The relationship between indexing consistency and the effectiveness of information retrieval appears to us to be a valid topic for future investigation.

Supposing, further, that optimum (100 per cent) reliability is not obtainable at the input (indexing) end, it is then worth investigating whether the remaining disagreement among indexers in a given system of indexing can be overcome, at least in part, at the output (searching) end. A study of retrieval reliability, and of the effect of alphabetical, classificatory, and concept-association tools and devices (whose utility, or lack of it, may differ from that at the indexing end), may contribute further to the optimization of information storage and retrieval systems.


APPENDIX  I 


TEST  RESULTS 


NOTE:  The  Base  Zero  Test  was  conducted  on  75  patents.  In  order  to 
simplify  its  comparison  with  the  25-patent  experiments  of  Phase  II,  a 
random  sample  of  this  size  was  selected  from  the  original  Base  Zero  Test. 
The  sample  was  tested  for  significant  variation  from  the  means  and 
variances  in  the  original  Base  Zero  Test.  None  was  obtained;  hence,  it 
is  considered  that  these  25  patents  represent  the  same  population,  with 
the  same  mean  and  variance,  as  the  original  seventy-five. 
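The subsampling step can be sketched as below. The report does not say how the random sample was drawn, so the numbering of the patents 1-75, the use of Python's `random.sample`, the seed, and the function name are all assumptions; the sketch only shows drawing 25 distinct patents without replacement, not the significance tests that followed.

```python
import random


def draw_subsample(population_size=75, sample_size=25, seed=0):
    """Draw distinct patent numbers 1..population_size without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


sample = draw_subsample()
print(len(sample), sample[:5])
```

Fixing the seed makes the drawn subsample reproducible, so the same 25 patents can be re-examined later.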
NUMBER OF TERMS USED: BASE ZERO

             Indexer 1       Indexer 2       Indexer 3
Patent Section Section Section Section Section Section   Total
No.          1       2       1       2       1       2

   1        16      27       9      19      15      21     107
   2         9      40      14      23      25      43     154
   3        18      15      16      40      22      53     164
   4         2      27       2      18       4      19      72
   5        19      14      10      16      18      20      97
   6        11       9      10      28       8      13      79
   7         6      13       4      20       8      28      79
   8        16      20      19      22      19      27     123
   9        11      12      10      23      19      38     113
  10         6      25       9      15      14      25      94
  11        13      28      22      25      23      29     140
  12        21      20      18      28      28      42     157
  13        11      16       9      13      17      17      83
  14         4      37      14      24      15      23     117
  15        20      22      13      23      18      31     127
  16         6      18      12      14      17      18      85
  17        17       8      14      17      17      16      89
  18         7      42      11      25      12      33     130
  19        10      27       8      19      16      16      96
  20         7      37      21      29      24      35     153
  21        16      39      20      33      19      27     154
  22        14      14       7      15      17      27      94
  23         9      28      15      27      22      28     129
  24         6      14       8      24      17      28      97
  25         2      29       9      26       8      26     100

Total      277     581     304     566     422     683    2833
NUMBER OF TERMS USED: THESAURUS

             Indexer 1       Indexer 2       Indexer 3
Patent Section Section Section Section Section Section
No.          1       2       1       2       1       2

   1        24      32      14      18      17      27
   2        27      26      12      22      22      42
   3        40      46      35      37      33      60
   4        52      41      19      25      29      35
   5        21      20      10      15      13      14
   6        24      39      13      25      12      29
   7         8      19       7      16       7      21
   8        28      41      18      28      22      30
   9        34      78      31      60      26      63
  10        31      33      15      19      20      26
  11        23      22       7      13      12      18
  12         8      20       6      17       9      31
  13        21       ?      15      24      17      23
  14        18      23      11      18      13      19
  15        23      27      13      29      18      35
  16        27      24      15      21      18      28
  17        22      23      12      19      17      32
  18        16      18       7      11       7      10
  19        50      50      23      25      27      37
  20        16      18       9      13       8      15
  21         3      16       1      11       1       9
  22        45      40      16      27      26      40
  23        36      31       3      25      18      30
  24        21      31      13      23      10      26
  25        23      42      17      25      17      38

Total      641     773     342     566     419       ?
NUMBER OF TERMS USED: PATENT OFFICE CLASSIFICATION

             Indexer 1       Indexer 2       Indexer 3
Patent Section Section Section Section Section Section
No.          1       2       1       2       1       2

   1        40      38      24       ?      25      37
   2        33      41      22      25      21      43
   3        12      51       6      42      14      66
   4        19      25      16      22      18      20
   5        27      35      29      27      32      36
   6         7      17       7      14       4      18
   7        24      33      15      20      18      23
   8        17      17      11      12      13      14
   9        12      14       8      14      18      29
  10        14      28       5      24      19      29
  11         5      12       8      15       9      38
  12        23      32      16      19      17      28
  13        16      20      14      15      16      25
  14         5      29       7      21      19      32
  15        10      18      11      22       7      16
  16        26      33      13      25      18      26
  17        23      34      17      25      18      42
  18         3      26       9      27      14      28
  19        40      35      18      28      23      34
  20        13      17       7      11      10      15
  21        18      24      16      24      15      21
  22        13      43       6      39      16      62
  23        33      36       8      24      12      26
  24        28      34      16      34      23      40
  25        16      17      13      16      13      17

Total      477     709     322     571     412     765
NUMBER OF TERMS USED: VOCABULARY

             Indexer 1       Indexer 2       Indexer 3
Patent Section Section Section Section Section Section
No.          1       2       1       2       1       2

   1        29       ?      22      30      26      46
   2        18       ?       8      29      17      33
   3        29      30      10      18      23      34
   4        31      41      17      29      27      51
   5        13      19       8      23       9      38
   6        42      50      22      33      32      50
   7        38      43      20      25      31      92
   8        33      19      11      12      24      23
   9        25      26      13      22      20      37
  10         9      26       8      34      10      53
  11        16      25       7      23       9      25
  12        31      39      19      31      23      44
  13        27      32      18      32      19      28
  14        26      26      15      26      21      41
  15        33      33      14      17      17      20
  16        12      20       6      10      11      13
  17        24      26      14      17      32      36
  18        30      37      11      19      22      37
  19        26      31      20      30      28      35
  20        21      35      15      29      20      37
  21        32      36      12      28      18      33
  22        19      43      11      21      14      31
  23        20      29      10      23      14      19
  24        26      38      18      34      22      37
  25         5      27       4      26       7      31

Total      615     797     333     621     496     924



PERCENTAGE  OF  TERMS  MATCHED:  BASE  ZERO 


Patent

No.

Indexers

1 - 2

Indexers

1 - 3

Indexers

2 - 3

Section 1

Section 2

Section 1

Section 2

Section 1

Section 2

1 

0.0 

9.5 

14.7 

4.9 

9.1 

11.1 

2 

0.0 

3.3 

33.3 

18.4 

18.2 

6.5 

3 

0.0 

0.0 

7.9 

2.4 

18.8 

32.9 

4 

0.0 

54.5 

6.3 

0.0 

19.4 

5 

0.0 

8.7 

3.8 

3.7 

20.0 

6 

5.0 

8.1 

5.9 

28.1 

7 

0.0 

10.1 

17.1 

37.0 

8 

2.9 

8.3 

22.6 

16.7 

9 

0.0 

16.0 

11.5 

19.6 

10 

7.1 

5.3 

0.0 

21.1 

5.3 

11 

2.9 

8.2 

5.6 

25.0 

10.2 

12 

0.0 

2.1 

0.0 

9.5 

20.7 

13 

5.3 

11.5 

12.1 

18.2 

15.4 

14 

3.7 

8.9 

9.5 

2.5 

16.0

6.8 

15 

3.1 

4.7 

0.0 

9.4 

3.3 

8.0 

16 

20.0 

0.0 

21.7 

11.9 

11.5 

14.3 

17 

3.3 

19.0 

5.0 

5.9 

6.9 

3.1 

18 

5.9 

11.1 

17.9 

6.7 

15.0 

13.7 

19 

0.0 

0.0 

6.9 

8.1 

9.1 

9.4 

20 

3.7 

8.2 

13.3 

9.1 

15.4 

8.5 

21 

5.9 

10.8 

13.3 

7.3 

11.4 

7.1 

22 

5.0 

3.6 

21.4 

0.0 

2.4 

23 

0.0 

5.8 

7.8 

32.1 

17.0 

24 

0.0 

0.0 

20.0 

13.6 

26.8 

25 

0.0 

7.8 

4.5

6.3 

8.3 
PERCENTAGE OF TERMS MATCHED: THESAURUS


Patent 

No. 

Indexers 

1  -  2 

Indexers 

1  -  3 

Indexers 

2  -  3 

Section  1 

Section  2 

Section  1 

Section  2 

Section  1 

Section  2 

1 

14.3 

28.1 

25.5 

3.3 

2.3 

2 

2.1 

16.7 

13.3 

9.7 

20.8 

3 

3.8 

5.8 

9.3 

7.9 

6.5 

4 

6.5 

15.7 

7.0 

2.1 

9.1 

5 

0.0 

2.9 

3.0 

3.0 

21.0 

16.0 

6 

8.8 

3.2 

5.9 

19.3 

13.6 

10.2 

7 

0.0 

0.0 

0.0 

5.3 

7.7 

12.1 

8 

4.5 

4.5 

6.4 

4.4 

11.1 

5.5 

9 

1.6 

5.3 

5.3 

8.5 

18.8 

18.3 

10 

12.2 

8.3

18.6 

9.3 

2.9 

2.3 

11 

0.0 

0.0 

12.9 

8.1 

11.8 

19.2 

12 

0.0 

2.8 

13.3 

4.1 

7.1 

17.1 

13 

0.0 

5.6 

5.9 

14.3 

27.0 

14 

7.9 

10.7 

7.7 

14.3 

12.1 

15 

1.8 

13.9 

5.1 

10.7 

14.3 

16 

10.5 

12.5 

15.4 

13.0 

17.9 

22.5 

17 

6.3 

2.4 

11.4 

5.8 

11.8 

10.9

18 

4.5 

3.6 

9.5 

7.7 

16.7 

16.7

19 

4.3 

4.2 

8.5 

11.5 

8.7 

6.9 

20 

13.6 

6.9 

9.1 

6.4 

6.3 

3.7 

21 

0.0 

0.0 

0.0 

0.0 

0.0 

17.6

22 

3.4 

1.5 

10.9 

5.3 

10.5 

13.5 

23 

0.0 

3.7 

22.7 

1.7 

5.0 

14.6 

24 

9.7 

10.2 

10.7 

7.5 

4.5 

6.5 

25 

0.0 

3.1 

2.6 

14.3 

3.0 

6.8 

PERCENTAGE  OF  TERMS  MATCHED:  PATENT  OFFICE  CLASSIFICATION  MANUAL 


Patent 

No. 


Indexers 

1  -  2 

Indexers 

1  -  3 

Indexers 

2  -  3 

Section  1 

Section  2 

Section  1 

Section  2 

Section  1 

Section  2 

1 

30.6 

47.7 

25.0 

40.0 

31.3 

2 


40.4 

38.5 

37.7 

53.6 

44.7 

3 

27.4 

36.8 

37.6 

25.0 

38.5 

4 

34.3 

42.3 

36.4 

31.7 

61.5 

5 

31.2 

47.5 

47.9 

69.4 

37.0 

6 


10.7 

37.5 

20.7 

22.2 

28.0 

7 

44.4 

43.2 

44.8 

36.6 

65.0 

53.6 

8 

47.4 

38.1 

50.0 

40.9 

22.2 

52.9 

9 

42.9 

40.0 

57.9 

34.4 

36.8 

38.7 

10 

35.7 

33.3 

37.5 

35.7 

26.3 

55.9 

11 

62.5 

17.4 

40.0 

11.1 

70.0 

26.2 

12 

39.3 

24.4 

53.8 

33.3 

65.0 

42.4 

13 

30.4 

16.7 

28.0 

25.0 

36.4 

29.0 

14 

33.3 

13.6 

14.3 

17.3 

13.0 

26.2 

15 

61.5 

48.1 

41.7 

25.9 

50.0 

40.7 

16 

34.5 

48.7 

46.7 

47.6 

72.2 

54.5 

17 

33.3 

28.3 

28.1 

28.8 

59.1 

26.4 

18 

33.3 

39.5 

13.3 

31.7 

53.3 

61.8 

19 

28.9 

26.0 

40.0 

34.9 

32.3 

40.9 

20 

17.6 

12.0 

35.3 

28.0 

41.7 

36.8 

21 

47.8 

41.2 

43.5 

32.4 

40.9 

36.4 

22 

55.6 

18.8 

11.5 

20.7 

37.5 

31.2 

23 

17.1 

27.7 

36.4 

37.8 

33.3 

35.1 

24 

29.4 

25.9 

41.7 

32.1 

62.5 

45.1 

25 

26.1 

26.9 

31.8 

30.8 

85.7 

65.0 
PERCENTAGE OF TERMS MATCHED: VOCABULARY


Patent 

No. 

Indexers 

1  -  2 

Indexers 

1  -  3 

Indexers 

2  -  3 

Section  1 

Section  2 

Section  1 

Section  2 

Section  1 

Section  2 

1 

45.7 

32.0 

48.6 

36.6 

49.0 

2 

18.2 

28.3 

40.0 

43.2 

31.9 

3 

22.0 

26.3 

44.4 

39.1 

23.8 

4 

41.2 

42.8 

52.6 

41.1 

45.4 

5 

61.5 

45.4 

57.1 

18.7 

24.5 

6 

30.6 

33.9 

39.6 

42.8 

43.1 

7 

34.9 

25.9 

46.8 

48.3 

37.8 

19.4 

8 

12.8 

24.0 

32.6 

27.3 

25.0 

40.0 

9 

31.0 

29.7 

45.2 

31.3 

50.0 

28.3 

10 

30.8 

27.7 

26.7 

29.5 

63.6 

48.1 

11 

27.8 

45.5 

38.9 

51.5 

45.5 

54.8 

12 

51.5 

42.9 

68.8 

45.6 

75.0 

47.1 

13 

50.0 

33.3 

31.4 

25.0 

37.0 

36.4 

14 

28.1 

20.9 

42.4 

26.4 

44.0 

36.7 

15 

22.9 

22.0 

25.0 

23.3 

55.0 

48.0 

16 

38.5 

20.0 

26.9 

32.0 

54.5 

35.3 

17 

33.3 

22.9 

55.6 

51.2 

48.4 

35.9 

18 

41.4 

43.6 

62.5 

48.0 

43.5 

33.3 

19 

17.9 

22.0 

22.7 

24.5 

50.0 

51.2 

20 

18.9 

28.0 

73.9 

40.8 

25.0 

41.9

21 

11.1 

23.1 

26.9 

34.5 

25.0 

44.4 

22 

42.9 

30.0 

54.5 

37.1 

50.0 

40.0 

23 

19.3 

38.5 

54.8

33.9 

60.0 

51.1 

24 

50.0 

15.2 

33.3 

19.1 

57.1 

42.5 

25 

44.0 

39.1 

43.8 

35.5 

46.9 

39.7 
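The percentage-matched tables give, for each patent and section, one figure per indexer pair (1-2, 1-3, 2-3). The report defines the measure elsewhere. The minimal Python sketch below assumes a common definition: terms assigned by both indexers divided by all distinct terms assigned by either, times 100. The function names and sample terms are hypothetical.

```python
from itertools import combinations


def percent_matched(terms_a, terms_b):
    """Percentage of terms matched between two indexers.

    ASSUMPTION: "matched" means assigned by both indexers, and the base is
    every distinct term assigned by either one. The report's own formula
    may differ.
    """
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    either = a | b
    if not either:
        return 0.0
    return 100.0 * len(a & b) / len(either)


def pairwise_matches(terms_by_indexer):
    """Percentage matched for every indexer pair, e.g. (1, 2), (1, 3), (2, 3)."""
    return {
        (i, j): round(percent_matched(terms_by_indexer[i], terms_by_indexer[j]), 1)
        for i, j in combinations(sorted(terms_by_indexer), 2)
    }


# Hypothetical index terms assigned to one section of one patent.
sample = {
    1: ["Polymer", "Catalyst", "Ethylene"],
    2: ["polymer", "ethylene", "pressure"],
    3: ["catalyst", "polyethylene"],
}
print(pairwise_matches(sample))  # {(1, 2): 50.0, (1, 3): 25.0, (2, 3): 0.0}
```

The term lists are turned into sets so that a term written twice on a card is not counted twice. Lower-casing the terms is a further assumption about what counts as the same term.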

APPENDIX II


STATISTICAL  ANALYSES 


ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED: BASE ZERO

Item                Sums of Squares    df          MSE   Significant?
Indexer                    775.8534     2     387.9267   Yes
Section                  4,559.5267     1   4,559.5267   Yes
Patent                     3,177.24    24     132.3850   Yes
Indexer/Section             24.0933     2      12.0466   No
Indexer/Patent               695.48    48      14.4891   No
Section/Patent           1,408.9733    24      58.7072   Yes
Residual                 2,545.9067    48      53.0397
Total                   13,187.0734   149
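In every table of this appendix, each mean square should equal its sum of squares divided by its degrees of freedom. The component rows should also add up to the Total row. For example, 2 + 1 + 24 + 2 + 48 + 24 + 48 = 149 degrees of freedom for 3 indexers × 2 sections × 25 patents. The minimal Python sketch below applies both checks to the Base Zero number-of-terms table. The names are hypothetical, and the 0.01 tolerance is an assumption chosen to absorb the four-decimal rounding.

With the Patent sum of squares taken as 3,177.24 (= 24 × 132.3850), the table balances. Read as 3,117.24, the components fall 60 short of the 13,187.0734 total.

```python
# Rows of the Base Zero "number of terms used" table:
# (item, sum of squares, degrees of freedom, mean square as printed).
BASE_ZERO_ROWS = [
    ("Indexer", 775.8534, 2, 387.9267),
    ("Section", 4559.5267, 1, 4559.5267),
    ("Patent", 3177.24, 24, 132.3850),
    ("Indexer/Section", 24.0933, 2, 12.0466),
    ("Indexer/Patent", 695.48, 48, 14.4891),
    ("Section/Patent", 1408.9733, 24, 58.7072),
    ("Residual", 2545.9067, 48, 53.0397),
]
BASE_ZERO_TOTAL_SS = 13187.0734
BASE_ZERO_TOTAL_DF = 149  # 3 indexers x 2 sections x 25 patents = 150 scores


def check_anova(rows, total_ss, total_df, tol=0.01):
    """Return a list of inconsistencies found in a printed ANOVA table.

    Checks that each mean square equals SS/df (within tol) and that the
    component sums of squares and degrees of freedom add up to the Total row.
    """
    problems = []
    for item, ss, df, ms in rows:
        if abs(ss / df - ms) > tol:
            problems.append(f"{item}: SS/df = {ss / df:.4f}, printed MSE = {ms:.4f}")
    ss_sum = sum(row[1] for row in rows)
    df_sum = sum(row[2] for row in rows)
    if abs(ss_sum - total_ss) > tol:
        problems.append(f"components sum to {ss_sum:.4f}, Total = {total_ss:.4f}")
    if df_sum != total_df:
        problems.append(f"df sum to {df_sum}, Total df = {total_df}")
    return problems


print(check_anova(BASE_ZERO_ROWS, BASE_ZERO_TOTAL_SS, BASE_ZERO_TOTAL_DF))  # []

# Reading the Patent sum of squares as 3,117.24 instead breaks both checks.
misread = [("Patent", 3117.24, 24, 132.3850) if r[0] == "Patent" else r
           for r in BASE_ZERO_ROWS]
for problem in check_anova(misread, BASE_ZERO_TOTAL_SS, BASE_ZERO_TOTAL_DF):
    print(problem)
```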
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED: THESAURUS

Item                Sums of Squares    df          MSE   Significant?
Indexer                    2,560.57     2     1,280.29   No
Section                    3,037.50     1     3,037.50   Yes
Patent                    13,404.22    24       558.51   Yes
Indexer/Section              349.72     2       174.86   Yes
Indexer/Patent             1,709.10    48        35.61   Yes
Section/Patent             1,905.00    24        79.38   Yes
Residual                     811.28    48        16.90
Total                     23,777.39   149
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED:
PATENT OFFICE CLASSIFICATION MANUAL

Item                Sums of Squares    df          MSE   Significant?
Indexer                  1,110.5734     2     555.2867   No
Section                  4,637.0400     1   4,637.0400   Yes
Patent                   6,502.4267    24     270.9344   No
Indexer/Section              171.64     2      85.8200   Yes
Indexer/Patent           1,526.0933    48      31.7936   No
Section/Patent           3,412.2933    24     142.1788   Yes
Residual                 1,013.0267    48      21.1047
Total                   18,373.0934   149
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED: VOCABULARY

Item                Sums of Squares    df          MSE   Significant?
Indexer                    2,846.56     2     1,423.28   No
Patent                   6,209.0266    24     258.7094   No
Section                  5,376.0266     1   5,376.0266   Yes
Indexer/Patent           2,539.7734    48      52.9119   Yes
Section/Patent           1,828.9734    24      76.2072   Yes
Indexer/Section            609.0134     2     304.5067   Yes
Residual                 1,395.9866    48       29.083
Total                    20,805.359   149
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED:
BASE ZERO AND THESAURUS

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)         1,391.05     1     1,391.05   No
Indexers (I)                 1,549.41     2       774.71   No
Sections (S)                 7,520.01     1     7,520.01   No
Patents (P)                  7,395.19    24       308.13   Yes
TC/I                         1,787.02     2       893.51   No
TC/S                            77.02     1        77.02   No
TC/P                         9,168.11    24       382.00   No
I/S                            152.14     2        76.07   Yes
I/P                          2,053.42    48        42.77   No
S/P                          1,802.99    24        75.12   No
TC/I/S                         221.67     2       110.84   Yes
TC/I/P                       1,679.32    48        34.99   Yes
TC/S/P                       1,529.12    24        63.71   Yes
I/S/P                        1,442.72    48        30.06   Yes
Residual                       586.33    48        12.22
Total                       38,355.52   299
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED: BASE ZERO AND
PATENT OFFICE CLASSIFICATION MANUAL

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)         596.4300     1       596.43   No
Indexers (I)               1,349.8867     2     674.9433   No
Sections (S)               9,196.4033     1   9,196.4033   Yes
Patents (P)                  4,736.18    24     197.3408   No
TC/I                           536.54     2     268.2700   No
TC/S                           0.1634     1       0.1634   No
TC/P                       4,943.4867    24     205.9786   Yes
I/S                           57.7267     2      28.8633   No
I/P                          2,161.28    48      45.0266   No
S/P                        2,745.8467    24     114.4102   No
TC/I/S                       277.8066     2     138.9033   Yes
TC/I/P                        60.2933    48       1.2561   No
TC/S/P                     2,075.4199    24      86.4758   Yes
I/S/P                      1,214.7733    48      25.3077   No
Residual                   2,204.3601    48      45.9241
Total                     32,156.5967   299
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED:
BASE ZERO AND VOCABULARY

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)       3,027.3633     1   3,027.3633   No
Indexers (I)               2,517.8067     2   1,258.9033   No
Sections (S)               9,918.7500     1   9,918.7500   Yes
Patents (P)                3,691.8800    24     153.8283   No
TC/I                       1,104.6067     2     552.3033   No
TC/S                          16.8034     1      16.8034   No
TC/P                       5,694.3867    24     237.2661   No
I/S                          215.4200     2     107.7100   No
I/P                          1,882.86    48      39.2262   No
S/P                        2,262.3333    24      94.2638   Yes
TC/I/S                       417.6866     2     208.8433   Yes
TC/I/P                     1,352.3933    48      28.1748   No
TC/S/P                     1,058.9466    24      44.1227   No
I/S/P                      1,709.2467    48      35.6093   No
Residual                   2,149.3134    48      44.7773
Total                     37,019.7967   299
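When base zero and one aid are analyzed together, the Total sum of squares of the combined table should equal three parts added together: the two single-condition Totals, plus the Test Conditions (TC) sum of squares. This holds because the TC row is the between-condition part of the variation. The Python sketch below first demonstrates the identity on a small made-up data set. It then applies the identity to the three number-of-terms comparisons, using figures copied from the tables. The function names are hypothetical, and the 0.01 tolerance is an assumed allowance for rounding.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def between_ss(groups):
    """Between-group sum of squares: group sizes times squared mean offsets."""
    pooled = [v for g in groups for v in g]
    grand_mean = sum(pooled) / len(pooled)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)


# Made-up scores for two conditions: pooled total = within + between.
groups = [[10, 12, 14], [20, 22, 27]]
pooled_total = sum_of_squares([v for g in groups for v in g])
within = sum(sum_of_squares(g) for g in groups)
assert abs(pooled_total - (within + between_ss(groups))) < 1e-9  # 215.5 = 34 + 181.5

# (base zero Total, aid Total, TC sum of squares, combined Total) as printed.
COMPARISONS = {
    "Thesaurus": (13187.0734, 23777.39, 1391.05, 38355.52),
    "Classification Manual": (13187.0734, 18373.0934, 596.43, 32156.5967),
    "Vocabulary": (13187.0734, 20805.359, 3027.3633, 37019.7967),
}
for aid, (bz, other, tc, combined) in COMPARISONS.items():
    gap = combined - (bz + other + tc)
    print(f"{aid}: difference {gap:+.4f}")
```

Applied to the percentage-of-terms-matched tables, the same identity gives 98,880.7239 - 11,132.2484 - 63,388.5888 = 24,359.8867 for the Vocabulary table, whose Total row is blank in this copy. That figure also agrees with the sum of that table's listed components.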
ANALYSIS OF VARIANCE OF NUMBER OF TERMS USED: THESAURUS,
PATENT OFFICE CLASSIFICATION MANUAL, AND VOCABULARY

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)         944.1733     2     472.0866   No
Indexers (I)                 5,876.92     2   2,938.4600   No
Sections (S)              12,874.7755     1  12,874.7755   Yes
Patents (P)                9,044.6311    24     376.8596   No
TC/I                         640.7867     4     160.1966   Yes
TC/S                         175.7912     2      87.8956   No
TC/P                      17,071.0489    48     355.6468   Yes
I/S                        1,040.1378     2     520.0689   Yes
I/P                        2,008.3022    48      41.8396   No
S/P                        2,203.5022    24      91.8125   No
TC/I/S                        90.2355     4      22.5588   No
TC/I/P                     3,766.6578    96      39.2360   Yes
TC/S/P                     4,942.7644    48     102.9742   Yes
I/S/P                        792.4178    48      16.5087   No
Residual                   2,427.8756    96      25.2903
Total                     63,900.0200   449
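The df column of the combined tables follows from the design. Assume each cell (test condition × indexer × section × patent) holds one score. Then every effect gets the product of (levels - 1) over the factors it involves, and all effects together use N - 1 degrees of freedom. The minimal Python sketch below (function name hypothetical) reproduces the three-aid table. TC/I/P receives 2 × 2 × 24 = 96 degrees of freedom, matching 3,766.6578 / 96 = 39.2360. The four-way term, printed as Residual, also has 96.

```python
from itertools import combinations
from math import prod


def factorial_df(levels):
    """Degrees of freedom for every main effect and interaction of a fully
    crossed design, keyed like the tables ("TC", "TC/I", "TC/I/S/P", ...).

    With one score per cell, the highest-order interaction is the term the
    tables label Residual.
    """
    names = list(levels)
    return {
        "/".join(combo): prod(levels[name] - 1 for name in combo)
        for k in range(1, len(names) + 1)
        for combo in combinations(names, k)
    }


# Three test conditions, three indexers, two sections, 25 patents.
levels = {"TC": 3, "I": 3, "S": 2, "P": 25}
dfs = factorial_df(levels)
total_df = prod(levels.values()) - 1

print(dfs["TC/I/P"], dfs["TC/I/S/P"], sum(dfs.values()), total_df)  # 96 96 449 449
```

With `{"TC": 2, "I": 3, "S": 2, "P": 25}`, the same function yields the 299-degree-of-freedom layout of the two-condition tables, with 48 for the residual.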
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED: BASE ZERO

Item                Sums of Squares    df          MSE   Significant?
Indexer                  2,238.9762     2   1,119.4881   No
Section                     22.3494     1      22.3494   No
Patent                     958.5900    24      39.9412   No
Indexer/Section            722.8468     2     361.4234   Yes
Indexer/Patent           3,648.4172    48      76.0086   Yes
Section/Patent             720.8690    24      30.0362   No
Residual                 2,821.0998    48      58.7708
Total                   11,132.2484   149


ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED: THESAURUS

Item                Sums of Squares    df          MSE   Significant?
Indexer Pairs              1,191.24     2       595.62   No
Section                        2.44     1         2.44   No
Patent                     1,024.43    24        42.68   Yes
Indexers/Section             158.36     2        79.18   Yes
Indexers/Patent            2,277.09    48        47.44   Yes
Section/Patent               265.71    24        11.07   No
Residual                   1,362.34    48        28.38
Total                      6,281.61   149


ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED:
PATENT OFFICE CLASSIFICATION MANUAL

Item                Sums of Squares    df          MSE   Significant?
Indexer                  2,940.5488     2   1,470.2744   Yes
Section                  1,467.0320     1   1,467.0320   Yes
Patent                   8,573.1276    24     357.2136   No
Indexer/Section            104.4486     2      52.2243   No
Indexer/Patent           6,950.5012    48     144.8021   Yes
Section/Patent           4,186.7596    24     174.4483   Yes
Residual                 4,051.8748    48      84.4140
Total                   28,274.2916   149
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED: VOCABULARY

Item                Sums of Squares    df          MSE   Significant?
Indexer Pairs            3,445.7942     2   1,722.8971   Yes
Section                  1,362.6294     1   1,362.6294   Yes
Patent                   6,275.2118    24     261.4671   No
Indexers/Section           237.3424     2     118.6712   No
Indexers/Patent          6,833.8958    48     142.3728   Yes
Section/Patent           3,542.2789    24     147.5949   Yes
Residual                 2,662.7343    48      55.4736
Total
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED:
BASE ZERO AND THESAURUS

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)            53.43     1        53.43   No
Indexers (I)                 3,297.11     2     1,648.56   No
Sections (S)                     5.02     1         5.02   No
Patents (P)                  1,234.11    24        51.42   No
TC/I                           131.10     2        65.55   No
TC/S                            19.76     1        19.76   No
TC/P                           748.90    24        31.20   No
I/S                            703.72     2       351.86   Yes
I/P                          3,212.96    48        66.94   Yes
S/P                            531.22    24        22.13   No
TC/I/S                         179.50     2        89.75   No
TC/I/P                       2,725.83    48        56.79   No
TC/S/P                         455.37    24        18.97   No
I/S/P                        1,611.59    48        33.57   No
Residual                     2,557.66    48        53.28
Total                       17,467.28   299
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED:
BASE ZERO AND PATENT OFFICE CLASSIFICATION MANUAL

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)      60,055.4305     1  60,055.4305   Yes
Indexers (I)               4,862.1921     2   2,431.0960   Yes
Sections (S)                 925.7633     1     925.7633   No
Patents (P)                5,192.0465    24     216.3352   No
TC/I                         317.3329     2     158.6664   No
TC/S                         563.6182     1     563.6182   Yes
TC/P                       4,339.6711    24     180.8196   No
I/S                          390.7521     2     195.3760   No
I/P                        5,105.1979    48     106.3582   No
S/P                        1,844.6883    24      76.8620   No
TC/I/S                       436.5432     2     218.2716   No
TC/I/P                     5,493.7205    48     114.4525   Yes
TC/S/P                     3,062.9402    24     127.6225   Yes
I/S/P                      3,589.9413    48      74.7904   No
Residual                   3,282.1334    48      68.3777
Total                     99,461.9715   299
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED:
BASE ZERO AND VOCABULARY

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)      63,388.5888     1  63,388.5888   Yes
Indexers (I)               5,578.3553     2   2,789.1776   No
Sections (S)                 867.0000     1     867.0000   No
Patents (P)                3,308.1372    24     137.8390   No
TC/I                         106.4150     2      53.2075   No
TC/S                         517.9788     1     517.9788   Yes
TC/P                       3,925.6645    24     163.5693   No
I/S                          775.0854     2     387.5427   Yes
I/P                        5,398.9714    48     112.4785   No
S/P                        1,647.6400    24      68.6516   No
TC/I/S                       185.1038     2      92.5519   No
TC/I/P                     5,083.3417    48     105.9029   Yes
TC/S/P                     2,615.5079    24     108.9794   Yes
I/S/P                      2,469.4046    48      51.4459   No
Residual                   3,013.5295    48      62.7818
Total                     98,880.7239   299
ANALYSIS OF VARIANCE OF PERCENTAGE OF TERMS MATCHED: THESAURUS,
PATENT OFFICE CLASSIFICATION MANUAL, AND VOCABULARY

Item                  Sums of Squares    df          MSE   Significant?
Test Conditions (TC)      87,239.2237     2  43,619.6118   Yes
Indexers (I)               6,480.2857     2   3,240.1428   Yes
Sections (S)               1,808.4098     1   1,808.4098   No
Patents (P)                5,500.1673    24     229.1736   No
TC/I                       1,097.2932     4     274.3233   No
TC/S                       1,023.6875     2     511.8437   Yes
TC/P                      10,372.5980    48     216.0957   No
I/S                          149.7681     2      74.8840   No
I/P                        3,730.0010    48      77.7083   No
S/P                        1,910.0367    24      79.5848   No
TC/I/S                       350.3882     4      87.5970   No
TC/I/P                    12,291.4951    96     128.0364   Yes
TC/S/P                     6,084.8400    48     126.7675   Yes
I/S/P                      3,286.0604    48      68.4595   No
Residual                   4,830.7543    96      50.3203
Total                    146,155.0090   449

APPENDIX  III 


INDEXING INSTRUCTIONS, AF30(602)-2616, PHASE II


1. Phase II tests are designed to measure the consistency of term
assignment among indexers using various indexing aids. During the
tests, indexers should not communicate with each other regarding
the indexing of any document.

2. The first test (of three scheduled) will use Documentation
Incorporated's Chemical Patents Coding Manual.

3. Each indexer will index a total of 50 chemical patents according to
the familiar rules of coordinate indexing, and will record freely
assigned terms on the tracing card. Simultaneously or subsequently,
the indexer will consult the indexing tools for each term on the
tracing card.

4. The indexing tool is to be used as an aid, i.e., indexers are free
to adopt or reject any of the terms contained in the Manual. They
will indicate, next to each term on the tracing card, one of the
following alternatives:

(a)  Original  term,  also  found  in  the  aid,  was  retained; 

(b)  Original  term,  not  found  in  the  aid,  was  retained; 

(c)  Original  term  was  rejected  upon  inspection  of  the  aid 
(irrespective  of  whether  the  term  was  found  in  the  aid),  and  no 
new  term  selected; 

(d)  Term  adopted  from  the  aid  instead  of  a  term  assigned  originally; 

(e)  Additional  term  adopted  from  the  aid. 

5. The title and all claims of each patent must be indexed. The remainder
of the patent should be indexed according to each indexer's judgment
of what is appropriate to describe the document.
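Instruction 4 amounts to a small coding scheme for every entry on the tracing card. The Python sketch below models it under stated assumptions. It assumes, hypothetically, that a (d) entry records both the original term and the term adopted from the aid, and that an (e) entry records only the added term. The class names and sample terms are illustrative and are not part of the instructions. The sketch derives the final term list after the aid has been consulted and counts how often each mark was used.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """The five marks of instruction 4 (letters as in the instructions)."""
    RETAINED_FOUND = "a"      # original term, also found in the aid, retained
    RETAINED_NOT_FOUND = "b"  # original term, not found in the aid, retained
    REJECTED = "c"            # original term rejected, no new term selected
    REPLACED = "d"            # term from the aid adopted instead of the original
    ADDED = "e"               # additional term adopted from the aid


@dataclass
class CardEntry:
    term: str                          # original term, or the aid term for (e)
    outcome: Outcome
    replacement: Optional[str] = None  # aid term adopted under (d) (assumption)


def final_terms(card):
    """Terms that remain assigned after the aid has been consulted."""
    kept = []
    for entry in card:
        if entry.outcome is Outcome.REJECTED:
            continue
        if entry.outcome is Outcome.REPLACED:
            if entry.replacement is None:
                raise ValueError(f"(d) entry for {entry.term!r} lacks a replacement")
            kept.append(entry.replacement)
        else:
            kept.append(entry.term)
    return kept


def tally(card):
    """Count how often each mark (a)-(e) was used on one card."""
    return Counter(entry.outcome.value for entry in card)


# Hypothetical tracing card for one patent.
card = [
    CardEntry("ethylene", Outcome.RETAINED_FOUND),
    CardEntry("high pressure", Outcome.RETAINED_NOT_FOUND),
    CardEntry("plastic", Outcome.REPLACED, replacement="polymer"),
    CardEntry("reaction", Outcome.REJECTED),
    CardEntry("catalyst", Outcome.ADDED),
]
print(final_terms(card))  # ['ethylene', 'high pressure', 'polymer', 'catalyst']
print(tally(card))
```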